Erasure recovery in a distributed storage system

ABSTRACT

A method for coding (k, r) data and a method for reconstructing data are provided. The coding method includes steps consisting in: dividing an initial datum a into k data blocks ai; grouping the k data blocks into r−1 subsets Sj of data blocks; generating, for each subset Sj, a linear function gj(a) defined as a linear combination of the data blocks assigned to said subset Sj; and generating r parity functions comprising a primary parity function f0(a) as a linear combination of the k data blocks ai, and r−1 secondary parity functions, each secondary parity function fj(a) being defined as the sum of the primary parity function f0(a) and of a linear function gj(a).

FIELD OF THE INVENTION

The invention lies in the general field of data storage systems and the management of failures in these systems. More particularly, the invention relates to methods and devices for coding and decoding data that make it possible to optimize the data read/write operations and to optimize the volume of data to be transferred, necessary for the repairing of a storage system following a failure.

STATE OF THE ART

In recent years, the volume of digital data has increased considerably, because of the use of applications such as social networks, email, file sharing, and video sharing. Thus, companies that offer online services have to manage data volumes of several tens of petabytes (1 PB=10¹⁵ bytes). These quantities of data, which continue to increase exponentially year on year, need to be stored reliably.

Distributed storage systems (DSS) are widely used these days to provide open-ended and reliable storage solutions. These architectures, as illustrated in FIG. 1, comprise multiple storage nodes (110, 112, 114) in which data are stored in a distributed fashion. A storage node can be composed of one (112) or of a plurality of storage devices (110, 114). Each storage device of a node which comprises several thereof is itself designated as a storage node. Generally, the storage devices are storage discs. A controller (102) makes it possible to manage communications from/to the nodes, including the disc read/write operations based on requests received from a computer (106) coupled to the communication network (104).

Now, it is not uncommon for discs, devices or storage nodes to fail, and it is commonplace, for thousands of nodes, for several tens of failures on average to be reported per day. A distributed storage system has to ensure the protection of the data in the case where one or several nodes or storage devices or discs fail.

In order to ensure the reliability and the availability of the data, the standard implementations incorporate solutions based on the redundancy of the data in order to be able to retrieve the data lost in the event of failure.

There are several methods, listed hereinbelow, that make it possible to add reliability to a storage system:

Replication is one of the methods most widely used. It consists in replicating one and the same data block over several storage nodes. Generally, one and the same data block is stored in n different nodes, which makes it possible to tolerate n−1 failures. In practice, each data block is stored in three different nodes making it possible to tolerate two failures. Even though replication offers a good tolerance to failures, it has the disadvantage of posting a storage overhead, which can be assessed at 200% in replications with 3 nodes.

Erasure Code. This approach is based on the use of the well-known Reed-Solomon (RS) codes. Each data file is broken down into k data blocks to which are added r parity function blocks generating k+r data blocks, and the term code RS(k,r) is used. The k+r data blocks are stored in k+r different nodes, in which k nodes store the k original blocks and r parity nodes store the r parity function blocks. A code RS(k,r) has two basic properties: any data block k out of the k+r blocks is sufficient to regenerate the original datum, and r failures, whatever they may be, can be tolerated. In practice, with a code RS(10,4), each data file is broken down into 10 blocks, and 4 parity function blocks are generated, making it possible to tolerate up to four failures. The RS codes offer a better storage efficiency compared to replication, the storage overhead being assessed at 40% for a code RS(10,4).

In a data storage system, in case of failure, reconstructing the data entails reading the data stored in the redundancy nodes, then transferring them in the network from these nodes supplying data, also called “providers”, into a new storage node, also called “newcomer”. Even if, in the event of failure of a single node, there is need for only a single provider in the case of replication, k providers are required in the case of a code RS(k,r), then involving the transfer over the network of the data from the k providers. The repair of a single failure can thus entail reading and transmitting hundreds of terabytes (TB). Also, the cost of the repair in terms of bandwidth and of reads/writes can be significant.

Some solutions listed hereinbelow address reducing the bandwidth required during the repair phase, or “bandwidth repair”.

Regenerating codes are one example. The main idea as described in the article “Network coding for distributed storage systems”, Alexandros G Dimakis, P Brighten Godfrey, Yunnan Wu, Martin J Wainwright, and Kannan Ramchandran, IEEE Transactions on Information Theory, 56(9): 4539-4551, 2010, is that each node generates a linear combination of data from the data blocks which are available to it. The quantity of data to be transferred from each provider node is then lower than that stored in each node, which reduces the bandwidth required for the repair. However, with a regenerating code, the information retrieved in the new node is a linear combination of the transferred data, and it can then not necessarily be equal to the data block lost.

Exact regenerating codes are proposed for retrieving exactly the same lost data block. The main idea as described in the article “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction”, Rashmi, Korlakai Vinayak, Nihar B. Shah, and P. Vijay Kumar, IEEE Transactions on Information Theory 57(8): 5227-5239, 2011, consists in eliminating the unwanted data blocks by using the concept of interference alignment. Even though the exact regenerating codes make it possible to considerably reduce the bandwidth for the repair, the cost of the disc inputs/outputs (“I/O overhead”) is not optimized. Indeed, to repair a single failure, an exact regenerating code requires all the data stored in all the nodes to be read.

Solutions are proposed for reducing both the repair bandwidth and the cost of the disc inputs/outputs.

For example, with the hierarchical codes as presented in the article “Hierarchical codes: How to make erasure codes attractive for peer-to-peer storage systems”, Alessandro Duminuco and Ernst Biersack, P2P′08, 8th IEEE International Conference on Peer-to-Peer Computing, pages 89-98, the main idea consists in hi-erarchically creating specific parity functions from smaller codes than the original code. A hierarchical code (k,r) needs └k/r−1┘ (integer part of k/r−1) to k data blocks read and downloaded to repair a single failure. However, the hierarchical code does not maintain all the properties of a conventional RS code. Indeed, a hierarchical code (k,r) cannot tolerate any r failures. It therefore proves less reliable with equal redundancy.

The hitchhiker code is a code which makes it possible to maintain the same properties as an RS code and to offer a failure repair that is less costly in terms of bandwidth and of disc inputs/outputs. The main idea as described in “A hitchhikers guide to fast and efficient data reconstruction in erasure-coded data centers”, KV Rashmi, Nihar B Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran, ACM SIGCOMM Computer Communication Review, 44(4):331-342, 2015, consists in reducing the number of data blocks needed to repair a single failure. The hitchhiker approach considers two RS subcodes having the same configuration (k,r). Specific parity functions which depend on the second code and on the first code are created in the second RS subcode. These interleaved parity functions make it possible to reduce the quantity of data read and transferred to repair a data block. A hitchhiker code (10,4) makes it possible to reduce by 30% to 35% the cost of the repair bandwidth and of the disc inputs/outputs.

The following references propose data coding methods for reducing the bandwidth:

Cheng Huang et al.: “Erasure Coding in Windows Azure Storage”, USENIX, 11 Apr. 2013 (2013-04-11), pages 1-12, XP061013941. This LRC (k, l, r) method is based on the generation of local parity functions stored in storage nodes; however, the LRC codes introduce a storage overhead due to the additional storage of the local parity functions.

Cheng Huang et al.: “Pyramid Codes: Flexible Schemes to Trade Spaces for Access Efficiency in Reliable Data Storage systems”, Proc. Sixth IEEE International Symposium on Network Computing and Applications 2007, IEEE, 1 Jul. 2007 (2007-01-07), pages 79-86, XP031119281, ISBN: 978-0-7695-2922-6. This method presents an alternative to the LRC method. However, the pyramid codes, like the LRC codes, introduce a storage overhead compared to a conventional erasure code and to the Babylon code.

These approaches do not allow for the same storage efficiency as a conventional erasure code, contrary to the hitchhiker codes, to the regenerating codes and to the hierarchical codes, which make it possible to reduce the bandwidth while maintaining the same storage efficiency as a conventional erasure code.

Contrary to the methods described previously based on the definition of a new erasure code, the solution proposed in the patent application US 2016/211869 A1 by Blaum Mario et al. proposes a dynamic storage system incorporating several erasure codes, capable of switching between different codes as a function of the objective. The system decides either to reduce the repair bandwidth, or to reduce the storage overhead as a function of the popularity of the file. The system supports in particular the two codes: LRC (cited previously) and “Product Code” (PC).

It emerges that there is a need for a data coding/decoding solution for reconstructing data lost or damaged in the event of a failure of one or more data storage elements, which is enhanced in terms of repair bandwidth cost and of disc input/output cost, while maintaining the same storage efficiency as the conventional erasure codes.

Moreover, with the advent of SSD (Solid State Drive) discs, the storage elements are increasingly faster, and each disc can generate several tens of Gigabits per second in read mode. The bit rate of the interfaces linking the storage elements is then particularly critical and becomes a bottleneck in the storage systems. Reducing the flows to be exchanged in the event of reconstruction is therefore particularly important. There is thus the need for a solution that makes it possible to optimize the quantity of data to be transferred when repairing a storage system, following a failure.

The present invention addresses the abovementioned various needs.

SUMMARY OF THE INVENTION

The present invention aims to mitigate the limitations of the known techniques by proposing methods and devices for coding and decoding data that make it possible to optimize the quantity of data transferred, as well as the read/write operations needed for the repair of a storage system, following a failure.

Advantageously, the present invention makes it possible to reduce both the repair bandwidth and the disc input/output costs while maintaining properties of storage efficiency and of reliability in which all the failures r out of k+r are tolerated.

One object of the present invention relates to a data coding method. In particular, it proposes a method for coding (k, r) data comprising steps consisting in:

-   -   dividing an initial datum a into k data blocks a_(i);     -   grouping the k data blocks into r−1 subsets S_(j) of data         blocks;     -   generating, for each subset S_(j), a linear function g_(j)(a)         defined as a linear combination of the data blocks assigned to         said subset S_(j), and     -   generating r parity functions comprising a primary parity         function f₀(a) as a linear combination of the k data blocks         a_(i), and r−1 secondary parity functions, each secondary parity         function f_(j)(a) being defined as the sum of the primary parity         function f₀(a) and of a linear function g_(j)(a).

Solutions are proposed for reducing both the repair bandwidth and the disc input/output cost.

For example, with the hierarchical codes as presented in the article “Hierarchical codes: How to make erasure codes attractive for peer-to-peer storage systems”, Alessandro Duminuco and Ernst Biersack, P2P′08, 8th IEEE International Conference on Peer-to-Peer Computing, pages 89-98, the main idea consists in hierarchically creating specific parity functions from smaller codes than the original code. A hierarchical code (k,r) needs └k/r−1┘ (integer part of k/r−1) to k data blocks read and downloaded to repair a single failure. However, the hierarchical code does not maintain all the properties of a conventional RS code. Indeed, a hierarchical code (k,r) cannot tolerate any r failures. It therefore proves less reliable with equal redundancy.

The hitchhiker code is a code which makes it possible to maintain the same properties as an RS code and to offer a failure repair that is less costly in terms of bandwidth and of disc inputs/outputs. The main idea as described in “A hitchhikers guide to fast and efficient data reconstruction in erasure-coded data centers”, KV Rashmi, Nihar B Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran, ACM SIGCOMM Computer Communication Review, 44(4):331-342, 2015, consists in reducing the number of data blocks needed to repair a single failure. The hitchhiker approach considers two RS subcodes having the same configuration (k,r). Specific parity functions which depend on the second code and on the first code are created in the second RS subcode. These interleaved parity functions make it possible to reduce the quantity of data read and transferred to repair a data block. A hitchhiker code (10,4) makes it possible to reduce by 30% to 35% the cost of the repair bandwidth and of the disc inputs/outputs.

The following references propose data coding methods for reducing the bandwidth:

Cheng Huang et al.: “Erasure Coding in Windows Azure Storage”, USENIX, 11 Apr. 2013 (2013-04-11), pages 1-12, XP061013941. This LRC (k, l, r) method is based on the generation of local parity functions stored in storage nodes; however, the LRC codes introduce a storage overhead due to the additional storage of the local parity functions.

Cheng Huang et al.: “Pyramid Codes: Flexible Schemes to Trade Spaces for Access Efficiency in Reliable Data Storage systems”, Proc. Sixth IEEE International Symposium on Network Computing and Applications 2007, IEEE, 1 Jul. 2007 (2007-01-07), pages 79-86, XP031119281, ISBN: 978-0-7695-2922-6. This method presents an alternative to the LRC method. However, the pyramid codes, like the LRC codes, introduce a storage overhead compared to a conventional erasure code and to the Babylon code.

These approaches do not allow for the same storage efficiency as a conventional erasure code, contrary to the hitchhiker codes, to the regenerating codes and to the hierarchical codes, which make it possible to reduce the bandwidth while maintaining the same storage efficiency as a conventional erasure code.

Contrary to the methods described previously based on the definition of a new erasure code, the solution proposed in the patent application US 2016/211869 A1 by Blaum Mario et al. proposes a dynamic storage system incorporating several erasure codes, capable of switching between different codes as a function of the objective. The system decides either to reduce the repair bandwidth, or to reduce the storage overhead as a function of the popularity of the file. The system supports in particular the two codes: LRC (cited previously) and “Product Code” (PC).

It emerges that there is a need for a data coding/decoding solution for reconstructing data lost or damaged in the event of a failure of one or more data storage elements, which is enhanced in terms of repair bandwidth costs and of disc input/output cost, while maintaining the same storage efficiency as the conventional erasure codes.

According to alternative or combined embodiments of the coding method:

-   -   the step of generating r parity functions consists in generating         r encoding vectors comprising:     -   a primary encoding vector C₀ composed of k primary encoding         coefficients c_(0,1) to c_(0,k); and     -   r−1 secondary encoding vectors C_(j) (1<=j<=r−1), each secondary         encoding vector being composed of k secondary encoding         coefficients c_(j,1) to c_(j,k);     -   each primary encoding coefficient c_(0,i) corresponds to a         non-zero random number, and a secondary encoding coefficient         c_(j,i) corresponds to a non-zero or zero random number         depending on whether the data block a_(i) belongs to or does not         belong to the subset S_(j);     -   the step of generating a linear function g_(j)(a) consists in         generating a linear combination of the k data blocks (a₁, . . .         , a_(k)) and of the secondary encoding vector C_(j);     -   k=10 and r=4.

The data coding method of the present invention is particularly suited for distributing data over storage devices for the purpose of reconstructing data lost following a failure of a storage device. Also claimed is a method for storing k+r data comprising at least one step consisting in storing, in k systematic storage devices N_(i), i∈{1, . . . , k}, k data blocks a_(i), and in storing, in r parity storage devices N_(i), i∈{k+1, k+r}, r parity functions, the r parity functions being obtained according to the data coding method claimed.

The invention covers also a data coding device comprising means for implementing the steps of the data coding method as claimed.

Another object of the present invention relates to a method for reconstructing or duplicating the content of a device which stores data coded according to the coding method claimed. In particular, a method is claimed for reconstructing the content of a storage device out of a plurality of storage devices storing k data blocks as in k systematic storage devices N_(i), i∈{1, . . . , k}, and r parity functions f_(j)(a) in r parity storage devices N_(i), i∈{k+1, . . . , k+r}, said k data blocks and r parity functions being obtained according to the data cooling method claimed, the reconstruction method comprising at least steps consisting in:

-   -   determining whether the storage device for which to reconstruct         the content is a systematic or parity storage device;     -   if said device is a systematic storage device N_(i), i∈{1, . . .         , k}:         -   determining to which subset of data S_(j) the data block of             said device belongs;         -   retrieving the primary and secondary parity functions stored             in the parity storage devices k+1 and k+1+j;         -   computing the linear function g_(j)(a) associated with the             subset S_(j);         -   retrieving the data blocks of the subset S_(j) except for             the block a_(i); and         -   reconstructing the content of said systematic storage device             N_(i), i∈{1, . . . , k};     -   if said device is a parity storage device N_(i), i∈{k+1, . . . ,         k+r}, determining the index i identifying said storage device;         -   if the index i is equal to k+1:             -   retrieving the secondary parity function f₀(a) stored in                 the parity storage device of index i=k+2 and the data                 blocks a_(i) of the subset S_(i−k+1);             -   computing the linear function g₁(a) associated with the                 subset S₁ and decoding the primary parity function                 f₀(a); and             -   reconstructing the content of said parity device of                 index i=k+1;         -   if the index i is greater than k+1:             -   retrieving the primary parity function f₀(a) stored in                 the parity storage device of index i=k+1 and the data                 blocks a_(i) of the subset S_(i−k+1;)             -   computing the linear function g_(i−k+1)(a) associated                 with the subset S_(i−k+1) and decoding the secondary                 parity function f_(i−k+1)(a); and             -   reconstructing the content of said parity device of                 index i>k+1.

The content reconstruction method is particularly suited to the reconstruction of content for data lost following a failure of a storage device.

According to alternative or combined embodiments of the reconstruction method:

-   -   the steps of retrieving the parity functions consist in         transferring the data from the corresponding storage devices to         a data collecting device;     -   the step of computing the linear function g_(j)(a), associated         with the subset S_(j) for the reconstruction of a systematic         node i such that i∈{1, . . . , k}, consists in subtracting the         primary parity function stored in the storage device N_(k+1)         from the secondary parity function stored in the storage device         N_(k+1+j);     -   the step of computing the linear function g₁(a), associated with         the subset S₁ for the reconstruction of a parity node i such         that i=k+1, consists in generating a linear combination from the         data blocks of the subset S₁ and from the encoding vector C₁;     -   the step of computing the linear function g_(j)(a), associated         with the subset S_(j) for the reconstruction of a parity node i         such that i∈{k+2, . . . , k+r}, consists in generating a linear         combination from the data blocks of the subset S_(j) and from         the encoding vector C_(j); the step of reconstructing the         content of said systematic storage device consists in decoding         the data block a_(i) from the linear function g_(j)(a);     -   the step of reconstructing the content of said parity storage         device i such that i=k+1 consists in decoding the secondary         parity function of said parity device of index i=k+1 from the         computed linear function g₁(a);     -   the step of reconstructing the content of said parity storage         device i such that i>k+1 consists in decoding the secondary         parity function of said parity device of index i>k+1 from the         computed linear function g_(i−k+1)(a).

The invention also covers a device for reconstructing the content of a storage device out of a plurality of devices storing k data blocks as in k systematic storage devices N_(i), i∈{1, . . . , k}, and r parity functions f_(j)(a) in r parity storage devices N_(i), i∈{k+1, . . . , k+r}, said k data blocks and r parity functions being obtained according to the coding method claimed, the device comprising means for implementing the steps of the claimed method for reconstructing the content of a storage device.

The present invention can be incorporated in hardware and/or software elements such as an integrated circuit of ASIC or FPGA type.

The invention relates also to a computer program which comprises coding instructions making it possible to perform the steps of the coding method claimed and/or comprising code instructions making it possible to perform the steps of the content reconstruction method claimed, when said program is run on a computer.

The invention can be available on a storage medium that can be read by a processor on which is stored a program comprising code instructions for executing the coding method and/or the content reconstruction method as claimed.

One advantageous use of the content reconstruction method claimed is in a distributed storage system which comprises a plurality of data storage nodes capable of storing data coded according to the coding method claimed.

Another advantageous use of the content reconstruction method claimed is in a local storage system which comprises data storage discs capable of storing data coded according to the coding method claimed.

DESCRIPTION OF THE FIGURES

Different aspects and advantages of the invention will emerge with the support of the description of a preferred but nonlimiting implementation of the invention, with reference to the figures below:

FIG. 1 illustrates a distributed system environment that makes it possible to implement the invention in an embodiment;

FIG. 2 shows a representation of the storage of the data in an embodiment of the invention;

FIG. 3 illustrates the steps of the coding method of the invention in an embodiment;

FIG. 4 illustrates the generation of a primary encoding vector, according to the principles of the invention;

FIG. 5 illustrates the generation of a secondary encoding vector, according to the principles of the invention;

FIG. 6 illustrates the generation of a primary parity function and of a secondary parity function, according to the principles of the invention;

FIGS. 7a to 7c illustrate the steps of reconstruction of systematic and parity nodes according to the principles of the invention;

FIG. 8 is a table comparing the performance levels of different codes.

DETAILED DESCRIPTION OF THE INVENTION

Throughout the rest of the description, the expression ‘storage node’ is used in the generic sense and denotes a storage entity, whether it be a single disc or a group of storage devices forming a node. The embodiments of the invention described apply to any variant of a distributed storage system illustrated in FIG. 1, which is only an example of an environment that makes it possible to implement the invention. Thus, this example is not limiting and the principles described can be applied for any other environment incorporating a storage system, distributed or not, such as cloud systems, data centres, distributed file systems, whether based on files or on objects. The invention can also be applied to non-distributed storage systems such as “disk arrays”. Moreover, a storage node which stores data derived from the initial datum is designated “systematic node” and a storage node which stores a parity function is designated “parity node”.

To facilitate the understanding of the principles of the invention, the coding method is first of all described on a particular example of a code (k=10, r=4), then is generalized for any code (k, r).

Description of the coding method on the example of a code (10, 4):

FIG. 2 illustrates a representation of the storage of data by application of the coding method of the invention, based on the example of a code k=10 and r=4. Hereinafter in the description, the coding steps of the present invention can be designated by the expression “Babylon code” or “Babylon coding”. An initial datum a is divided into k=10 pieces (200) or data blocks (a₁, . . . , a₁₀), denoted a_(i), i∈{1, . . . , 10}, to which are added r=4 parity blocks or parity functions f₀, f₁, f₂ and f₃, which are linear combinations of the data (a₁, . . . , a₁₀) and hereinbelow denoted f₀(a), f₁(a), f₂(a), f₃(a) or f₀(a) and f_(j)(a), j∈{1, 2, 3}. The parity function f₀(a) is designated as primary parity function, and the parity functions f_(j)(a) are designated as secondary parity functions. The primary and secondary parity functions are generated according to the Babylon coding (202) of the invention and stored on storage nodes (204).

The systematic data blocks a₁ to a₁₀ are stored on 10 corresponding systematic storage nodes (206), denoted (N₁ to N₁₀), and the primary f₀(a) and secondary f₁(a), f₂(a), f₃(a) parity functions are stored on 4 corresponding parity storage nodes (208), denoted N₁₁ to N₁₄.

FIG. 3 illustrates the steps of the coding method (300) of the invention in an embodiment of a code (k=10, r=4). In a first step (302), the method makes it possible to divide an initial datum a into 10 data blocks (a₁, . . . , a₁₀), then, in a next step (304), to create 3 data subsets (S₁, S₂, S₃). Each data subset comprises different data blocks out of the 10 blocks created. In one embodiment, two subsets (S₁, S₂) are each composed of 3 data blocks and the third subset (S₃) is composed of 4 data blocks. The first subset S₁ is composed of the first three blocks (a₁, a₂, a₃) obtained from the initial datum, the second subset S₂ is composed of the next three blocks (a₄, a₅, a₆), and the last subset S₃ is composed of the last four remaining blocks (a₇, a₈, a₉, a₁₀).

In a next step (306), the method makes it possible to generate 4 encoding vectors of length (k=10), corresponding to the 4 parity nodes. The encoding vectors denoted C₀ to C₄ correspond to a primary encoding vector C₀ and 3 secondary encoding vectors C₁, C₂, C₃.

FIGS. 4 and 5 detail the steps of generation of the encoding vectors according to the principles of the invention.

FIG. 4 illustrates the generation of the primary encoding vector C₀ (400) in an embodiment. The primary encoding vector C₀ is composed of 10 primary encoding coefficients c_(0,1) to c_(0,10) in which each primary encoding coefficient c_(0,1) (402-i) corresponds to a non-zero random number (404-i).

FIG. 5 illustrates the generation of the secondary encoding vectors C_(j) (1<=j<=3) in an embodiment (500). Each secondary encoding vector C_(j) is composed of 10 secondary encoding coefficients c_(j,1) to c_(j,10).

To generate a vector C_(j), the method checks (504-i), for each block a_(i), whether the block a_(i) belongs to the subset S_(j). If such is the case (yes branch), the method makes it possible to associate with the coefficients c_(j,i) non-zero random numbers (506-i). If the blocks as do not belong to the subset S_(j) (no branch), the method makes it possible to associate with the coefficients c_(j,i) zero random numbers (508-i).

For example, to generate the encoding vector C₁, the method checks, for each data block a₁, a₂, . . . a₁₀, whether it belongs to the subset S₁. Since the data blocks a₁, a₂, a₃ belong to the subset S₁, the method therefore generates non-zero random numbers for the coefficients c_(1,1) and c_(1,3). Since the data blocks a₄, . . . , a₁₀ do not belong to S₁, the method generates zero random numbers associated with the coefficients c_(1,4) . . . c_(1,10).

Thus, for the subset S₁, the method makes it possible to generate an encoding vector C₁ equal to c_(1,1), c_(1,2), c_(1,3), 0, 0, 0, 0, 0, 0, 0) in which c_(1,1), c_(1,2) and c_(1,3) are non-zero random numbers.

For the subset S₂, the method makes it possible to generate an encoding vector C₂ equal to (0, 0, 0, c_(2,4), c_(2,5), c_(2,6), 0, 0, 0, 0) in which c_(2,4), c_(2,5) and c_(2,6) are non-zero random numbers.

For the subset S₃, the method makes it possible to generate an encoding vector C₃ equal to (0, 0, 0, 0, 0, 0, c_(3,7), c_(3,8), c_(3,9), c_(3,10)) in which c_(3,7), c_(3,8), c_(3,9) and c_(3,10) are non-zero random numbers.

Returning to FIG. 3, after the step of generation of the encoding vectors, the method makes it possible to generate (308) 4 parity functions as linear combinations of the data blocks (a₁, . . . , a₁₀) comprising a primary parity function f₀(a) and 3 secondary parity functions f₁(a), f₂(a) and f₃(a).

FIG. 6 details the generation of the primary parity function f₀(a), and the secondary parity functions f_(j)(a) j∈{1, 2, 3} according to the principles of the invention. The primary parity function (602) is a linear combination of the 10 data blocks (a₁, . . . a₁₀) and is stored in the parity node N₁₁. Each data block as has an associated encoding coefficient c_(0,i) of the primary encoding vector C₀ (400). The primary parity function f₀(a) can be written according to the following equation:

f ₀(a)=c _(0,1) *a ₁ +C _(0,2) *a ₂ +C _(0,3) *a ₃ +C _(0,4) *a ₄ +C _(0,5) *a ₅ +C _(0,6) *a ₆ +C _(0,7) *a ₇ +C _(0,8) *a ₈ +C _(0,9) *a ₉ +c _(0,10) *a ₁₀

in which the coefficients c_(0,1), c_(0,2), . . . , c_(0,10) are non-zero random numbers.

As illustrated in FIG. 6, to generate the 3 secondary parity functions f_(j)(a) (1<=j<=3) (604), the method first generates 3 linear functions g_(j)(a) (1<=j<=3) (606). Each linear function g_(j)(a) is a linear combination of the 10 data blocks (a₁, . . . , a₁₀) and of the encoding vectors C_(j) (1<=j<=3) (500).

The linear function g₁(a) is obtained by linear combination of the 10 data blocks (a₁, . . . , a₁₀) and of the encoding vector C₁, and it can then be written:

g ₁(a)=c _(1,1) *a ₁ +c _(1,2) *a ₂ +c _(1,3) *a ₃.

The linear function g₂(a) is obtained by linear combination of the 10 data blocks (a₁, . . . , a₁₀) and of the encoding vector C₂, and it can then be written:

g ₂(a)=c _(2,4) *a ₄ +c _(2,5) *a ₅ +c _(2,6) *a ₆.

Similarly, the linear function g₃(a) is obtained by linear combination of the 10 data blocks (a₁, . . . , a₁₀) and of the encoding vector C₃ and can be written:

g ₃(a)=c _(3,7) *a ₇ +c _(3,8) *a ₈ +c _(3,9) *a ₉ +c _(3,10) *a ₁₀.

Once the functions g_(j)(a) (1<=j<=3) are generated, the method makes it possible to create the secondary parity functions f₁(a), f₂(a) and f₃(a). The secondary parity function f₁(a) is obtained by the sum of the primary parity function f₀(a) and of the linear function g₁(a). It can be represented as follows:

f ₁(a)=f ₀(a)+g ₁(a)

Similarly, the secondary parity function f₂(a) is obtained by the sum of the primary parity function f₀(a) and of the linear function g₂(a). It can be represented as follows:

f ₂(a)=f ₀(a)+g ₂(a)

Likewise, the secondary parity function f₃(a) is obtained by the sum of the primary parity function f₀(a) and of the linear function g₃(a). It can be represented as follows:

f ₃(a)=f ₀(a)+g ₃(a)

The secondary parity functions f₁(a), f₂(a), f₃(a) are stored respectively in the parity nodes N₁₂, N₁₃ and N₁₄ (208).

Advantageously, the linear functions g₁(a), g₂(a), g₃(a) and the primary parity function f₀(a) are generated so as to be linearly independent of one another. Moreover, the sum of any two of these functions is also linearly independent of the other functions. Finally, advantageously, since all the data blocks are linearly independent, the Babylon code maintains the same properties as an RS code.

FIGS. 7a to 7c illustrate the steps of detecting failures and of reconstructing the content of systematic nodes and of parity nodes according to the principles of the invention.

As illustrated in FIG. 7a , the method is initiated (702) when information relating to the detection of a failure in the storage system is available indicating the number of failures (704). For the cases where the number of failures is greater than or equal to five, the repair method may not be operative and the system is considered to be defective (706). For the cases where the number of failures lies between two and four failures, the two, three or four nodes that have failed can be reconstructed according to the known repair methods (708), of erasure code or Reed Solomon type.

The method claimed according to the Babylon code works for the cases where a single storage node in a local or distributed storage system has failed. The repair steps will make it possible to retrieve, in the other storage nodes, the lost data which were contained in the failed node and to reconstruct a new storage node with the same data. After a failure is detected, the method determines the nature of the node which has failed (710), namely whether it is a systematic node or a parity node.

In order to facilitate the understanding of the principles of the invention and without limitation, the reconstruction method is first of all described on a particular example of a code (k=10, r=4), then is generalized for any code (k, r).

Description of the decoding method of the invention on the example of a storage system with a code (10, 4):

If the failure relates to a systematic node N_(i), i∈{1, . . . , 10}, that is to say a node storing data blocks a_(i), the reconstruction (712) is done according to the steps of the method described in FIG. 7 b.

If the failure relates to a parity node N_(i), i∈{11, . . . , 14}, that is to say a node storing parity functions, the reconstruction (714) is done according to the steps of the method described in FIG. 7 c.

FIG. 7b illustrates the steps of reconstruction of a systematic node according to the present invention.

To illustrate the method, it is assumed that the block a₁ located in the node N₁ is lost, and that it must be reconstructed. In a first step (716), the method makes it possible to determine the subset of data S_(j), j∈[1,2,3], to which the datum as of the node N_(i) that has failed belongs. In the example, the data block a₁ belongs to the subset S₁, which contains the data (a₁, a₂, a₃). In a next step (718), the method makes it possible to retrieve the data of the primary parity function f₀(a) which are stored in the node N₁₁ and to retrieve the data of the secondary parity function f₁(a) corresponding to the index of the identified subset S₁, which are stored in the node N₁₂. The expression “retrieve the data” is used to denote operations of transferring of the data from a storage node to a data collecting device. The data collecting device can be incorporated in a general storage system controller, where the node reconstruction method can be implemented. The retrieval of the data also includes the operations of reading/writing from/into memory elements of the storage nodes and of the data collecting device.

After the data collection step, the method makes it possible (720), from the retrieved data, to compute the linear function g₁(a) associated with the subset of data S₁. In one embodiment, the linear function g_(j)(a) is obtained by subtraction of the primary parity function f₀(a) located in the node N₁₁ from the secondary parity function f₁(a) located in the node N₁₂.

In a next step (722), the method makes it possible to load the data blocks (a₂, a₃) of the subset S₁, except for the block a₁ which is lost. By subtracting these data from the linear function g₁(a), the datum a₁ is retrieved and the primary storage node N₁ for the lost data block a₁ is reconstructed (724).

The reconstruction of the node for a₁ needs to use only 4 data blocks {a₂, a₃, f₀(a), f₁(a)}, unlike a conventional RS code which would have required 10 blocks, i.e. a saving of 60%. Similarly, the reconstruction of a systematic node for any data block a_(i) for i∈{1, 2, 3}, requires only 4 blocks, by using the data blocks having the index i∈{1, 2, 3}\{i}, that is to say except for the data block of index i corresponding to the node N_(i) that has failed, and by using the data blocks of the primary parity f₀(a) and secondary parity f₁(a) functions.

The reconstruction of a systematic node for any data block a_(i) for i∈{4, 5, 6} requires 4 blocks, by using the data blocks having the index i∈{4, 5, 6}\{i} and by using the data blocks of the primary parity f₀(a) and secondary parity f₂(a) functions.

The reconstruction of a systematic node for any data block a_(i) for i∈{7, 8, 9, 10} requires 5 blocks, by using the data blocks having the index i∈{7, 8, 9, 10}\{i} and by using the data blocks of the primary parity f₀(a) and secondary parity f₃(a) functions.

FIG. 7c illustrates the steps of reconstruction of a parity node according to the present invention.

The nodes N_(i), i∈[11,14] contain the primary f₀(a) and secondary f₁(a), f₂(a), f₃(a) parity functions. The method first of all determines (726) whether the node to be repaired is N₁₁ which contains the primary parity function f₀(a). If yes, the method makes it possible to transfer (728) the data f₁(a) and those of the subset S₁ containing a₁, a₂ and a₃ from the nodes N₁₂, N₁, N₂ and N₃. In a next step (730), the method makes it possible to generate the linear function g₁(a) using the data a₁, a₂ and a₃, then to decode the primary parity function f₀(a) by subtracting the linear function g₁(a) from the parity function f₁(a). The method then makes it possible to reconstruct the node N₁₁.

Like the nodes i, i∈{1, . . . , 10}, the repair of N₁₁ requires only 4 data blocks (f₁(a), a₁, a₂ and a₃).

Returning to the step (726), if the node to be repaired is a parity node N_(i) such that i∈[12,14], the method makes it possible to load (734) the data of the node N₁₁ which contains the parity function f₀(a), and load the data of the subsets S_(i−11). In a next step (736), the method makes it possible to generate the linear function g_(i−11)(a) using the data of the subsets and to decode the parity function f_(i−11)(a) by effecting the sum of the primary parity function f₀(a) and of the linear function g_(i−11)(a). The method then makes it possible to reconstruct (738) the node N_(i), i∈[12,14].

Thus, for the reconstruction of the node N₁₂ containing the secondary parity function f₁(a), the method makes it possible to load the parity function f₀(a) of the node N₁₁ and the data of the subset S₁ (a₁ from the node N₁, a₂ from the node N₂ and a₃ from the node N₃). In a first stage, the parity function g₁(a) is generated using the data a₁, a₂ and a₃ of the subset S₁. Next, the secondary parity function f₁(a) is decoded by effecting the sum of f₀(a) and g₁(a) and the node N₁₂ is reconstructed.

Likewise, for the reconstruction of the node N₁₃ containing the secondary parity function f₂(a), the method makes it possible to load the data of the node N₁₁ which contains the parity function f₀(a) and the data of the subset S₂. In a first stage, the parity function g₂(a) is generated using the data a₄, a₅ and a₆ of the subset S₂. Next, the secondary parity function f₂(a) is decoded by effecting the sum of f₀(a) and g₂(a) and the node N₁₃ is reconstructed.

Finally, for the reconstruction of the node N₁₄ containing the secondary parity function f₃(a), the method makes it possible to load the parity function f₀(a) contained in the node N₁₁ and the data of the subset S₃. In a first stage, the parity function g₃(a) is generated using the data a₇, a₈, a₉, and a₁₀ of the subset S₃. Next, the secondary parity function f₃(a) is decoded by effecting the sum of f₀(a) and g₃(a) and the node N₁₄ is reconstructed.

It should be noted that the repair of the nodes N₁₂ and N₁₃ requires only the transfer of 4 data blocks, respectively {f₀(a), a₁, a₂ and a₃} and {f₀(a), a₄, a₅ and a₆}, and that the repair of the node N₁₄ requires the transfer of 5 data blocks {f₀(a), a₇, a₈, a₉, a₁₀}.

Thus, advantageously, the claimed reconstruction method requires only 4 or 5 data blocks for the reconstruction of any node i∈{1, . . . , 14}. It makes it possible to reduce the bandwidth and the inputs/outputs by 50% to 60% compared to an RS code.

The general principle of the coding method of the invention is now described for any code (k, r):

Starting from a datum a as input of size M, the general principle of the invention consists in dividing the datum into k data blocks (a₁, . . . , a_(k)), each block having a size M/k. Each data block as stored in the node N_(i) is of fixed size m=M/k.

The following notations are suitable for the description below:

-   -   └X┘ expresses the integer part of X; and     -   ┌X┐ expresses the excess integer part of X.

The Babylon coding method begins with a step of division of the k data blocks (a₁, . . . , a_(k)) into (r−1) subsets (S₁, S₂, . . . , S_(r−1)), denoted S_(j), 1≤j≤r−1. If k is a multiple of r−1, the subsets have one and the same size k/r−1. If k is not a multiple of r−1, each of the r−1−q first subsets is composed of t_(f) data blocks, and the following q subsets contain t_(c) data blocks with:

t _(f) =└k/r−1┘,

t _(c) =┌k/r−1┐, q=k mod(r−1)

The first subset S₁ is composed of data blocks belonging to (a₁, . . . , a_(tf)). The j^(th) subset S_(j) is composed of data blocks belonging to (a_((j−1)tf+1), . . . , a_(jtf)), for (1≤j≤r−1).

The second step of the method consists in generating r encoding vectors comprising a primary encoding vector C₀ and r−1 secondary encoding vectors C_(j) (1<=j<=r−1).

The primary encoding vector C₀ is composed of k encoding coefficients c_(0,1) to c_(0,k) in which each primary encoding coefficient c_(0,i) corresponds to a non-zero random number.

A secondary encoding vector C_(j) is composed of k encoding coefficients c_(j,1) to c_(j,k). To generate a secondary encoding vector C_(j), the method checks, for each block a_(i), whether the block a_(i) belongs to the subset S_(j). If such is the case, then the method makes it possible to associate non-zero random numbers with the coefficients c_(j,i), and if the block as does not belong to the subset S_(j), then the method makes it possible to associate zero random numbers with the coefficients c_(j,i).

The structure of the encoding vectors C₀ to C_(r−1) is represented hereinbelow:

c₀ = [c_(0, 1), c_(0, 2), … , c_(0, k)] c₁ = [c_(1, 1), c_(1, 2), c_(1, tf), … , 0, … , 0] c₂ = [0, … , 0, c_(2, tf + 1), … , c_(2, 2tf)0, … , 0] … c_(r − q) = [0, … , 0, c_(r − q, (r − q − 1), tf + 1), … , c_(r − q, (r − q)tc), 0, … , 0] c_(r − q + 1) = [0, … , 0, c_(r − q + 1, (r − q), tc + 1), … , c_(r − q + 1, (r − q), tc), 0, … , 0] … c_(r − 1) = [0, … , … , … , 0, c_(r − 1, rtc + 1), … , c_(r − 1, k)].

After the step of generation of the encoding vectors, the method makes it possible to generate r parity functions, linear combinations of the data (a₁, . . . , a_(k)) and composed of a primary parity function f₀(a) and of r−1 secondary parity functions f₁(a), . . . , f_(r−1)(a).

The primary parity function is a linear combination of the k data blocks (a₁, . . . , a_(k)) and is stored in the parity node of index k+1, N_(k+1). Each data block a_(i) has an associated encoding coefficient c_(0,i) of the primary encoding vector C₀. The primary parity function f₀(a) can be written according to the following equation:

f ₀(a)=c _(0,1) *a ₁ +c _(0,2) *a ₂ + . . . +c _(0,k) *a _(k)

in which the coefficients c_(0,1), c_(0,2), . . . , c_(0,k) are non-zero random numbers.

To generate the r−1 secondary parity functions f_(j)(a) (1<=j<=r−1), the method makes it possible to generate first of all r−1 linear functions g_(j)(a) (1<=j<=r−1). Each linear function g_(j)(a) is a linear combination of the k data blocks (a₁, . . . , a_(k)) and of the encoding vectors C_(j) (1<=j<=r−1). Once the linear functions g_(j)(a) (1<=j<=r−1) are generated, the method makes it possible to create the secondary parity functions f₁(a), f₂(a), . . . f_(r−1)(a) such that a secondary parity function f_(j)(a), 1≤j≤r−1, is obtained by adding the primary parity function f₀(a) to a linear function g_(j)(a):

f _(j)(a)=f ₀(a)+g _(j)(a)

The data blocks and the parity functions generated according to the coding of the invention are stored in storage nodes of a local or distributed storage system. The k data blocks (a₁, . . . , a_(k)) are stored in k systematic nodes N_(i), i∈{1, . . . k} and each of the r parity blocks obtained by a parity function is stored in a parity node N_(i), i∈{k+1, . . . , k+r}, as illustrated in the table below:

Node 1 a₁ . . . . Node k a_(k) Node k + 1 f_(o)(a) Node k + 2 f_(o)(a) + g₁(a) . . . . Node k + r f_(o)(a) + g_(r−1)(a)

Advantageously, the linear functions g_(j)(a), 1≤j≤r−1 and the primary parity function f₀(a) are generated so as to be linearly independent of one another. Moreover, the sum of any two of these functions is also linearly independent of the other functions. Finally, advantageously, all the data blocks being linearly independent, the Babylon code maintains the same properties as an RS code.

The general principle of the reconstruction method of the invention is now described for any code (k, r). The reconstruction method is initiated when information relating to the detection of a failure in the storage system is available. The method can be initiated to duplicate the content of a node. The reconstruction method in the case of a failure works for the cases in which a single storage node in a local or distributed storage system has failed. The repair steps will make it possible to retrieve, in the other storage nodes, the lost data which were contained in the node that has failed and to reconstruct a new storage node with the same data.

When a failure is detected, the method determines the nature of the node which has failed, namely whether it is a systematic node or a parity node.

If the failure relates to a systematic node N_(i), i∈{1, . . . , k}, that is to say a node storing data blocks a_(i), the reconstruction is done according to the case 1 described hereinbelow.

If the failure relates to a parity node N_(i), i∈{k+1, . . . , k+r}, that is to say a node storing parity functions, the reconstruction is done according to the case 2 described hereinbelow.

Case 1: Reconstruction of the Nodes N_(i), i∈{1, . . . k}

Considering a repair of the datum as stored in a node N_(i) such that (1≤i≤k) with a_(i)∈S_(j) and (1≤j<r), the method makes it possible, initially, to transfer the data contained in the nodes of index k+1, N_((k+1)) and of index k+1+j, N_((k+1+j)). Next, the datum f₀(a), stored in the node of index k+1, N_((k+1)), is subtracted from the datum f_(j)(a) stored in the node of index k+1+j, N_((k+1+j)) to recover the linear function g_(j)(a) associated with the subset S_(j). The method then makes it possible to transfer the data of the subset S_(j)\{a_(i)} (that is to say, except for the data block a_(i)). The use of the linear function g_(j)(a) makes it possible to decode the lost datum a_(i). The number of data blocks needed to reconstruct as is therefore (t_(f)+1) if i belongs to the (r−1−q) first subsets. If i belongs to the q last subsets, the total number of data blocks required is (t_(c)+1).

Case 2: Reconstruction of the Nodes N_(i), i∈{k+1, . . . , k+r}

As in the preceding case, for the reconstruction of the node N_(i) such that k+1<i≤k+r, the method makes it possible to transfer the datum f₀(a) stored in the node of index k+1, N_((k+1)), and the data located in the subset S_(i−k+1). The linear function g_(i−k+1)(a) is obtained from the data of the subset S_(i−k+1). Next, the decoding is performed by using the primary function f₀(a) and the linear function g_(i−k+1)(a) to retrieve the secondary function f_(i−k+1)(a) (in effect, f_(i=k+1)(a)=f₀(a)+g_(i−k+1)(a)). It should be noted that if (i−k+1) belongs to the (r−1−q) first blocks, the repair requires (t_(r)+1) data blocks. If (i−k+1)∈{r−q, . . . , r−1}, then the repair requires (t_(c)+1) blocks.

For the particular case of the node of index k+1, N_(k+1) containing the primary parity function f₀(a), the method makes it possible to transfer the first subset S₁ to generate the linear function g₁(a). The method then makes it possible to transfer the data from the node of index k+2, N_(k+2) containing the secondary parity function f₁(a). By subtracting the linear function g₁(a) from the secondary parity function f₁(a), the primary parity function f₀(a) is decoded (in effect, f₀(a)=f₁(a)−g₁(a)). The datum of the node of index k+1, N_(k+1) containing the primary parity function f₀(a), is retrieved by using (t_(f)+1) data blocks.

Generally, the average value of the data transferred and read for the reconstruction/repair, whatever the type of node, is given by the following equation:

$\gamma = {\frac{1}{\left( {k + r} \right)}\left\lbrack {\left( {\left( {{\left( {r - 1 - q} \right) \times t_{j}} + \left( {r - q} \right)} \right) \times \left( {t_{j} + 1} \right)} \right) + \left( {q\left( {L_{c} + 1} \right)}^{2} \right)} \right\rbrack}$

It should be noted that the description takes for the example a value of r>=3. For a value r=1, the Babylon code acts as a conventional RS code. For a value r=2, the data are divided into two subsets S₁ and S₂, in which S₁ is composed of the t_(s) first blocks and S₂ contains the k−t_(s) last blocks, in which t_(s)=└k/┘.

Advantageously, the method of the invention makes it possible to not use all the nodes in the reconstruction process. Moreover, the nodes contributing to the reconstruction do not have to perform computations, but they simply have to transfer their content to a data collector where the decoding operations will be performed. The data collector can be the new storage node replacing the failed node.

Still advantageously, by virtue of the Babylon coding method, the reconstruction time is reduced compared to the methods based on matrices in which all of the data blocks of the k nodes are necessary for the reconstruction.

FIG. 8 is a comparative table of the performance levels comparing the results of different known codes with those obtained by the Babylon code of the present invention. In the table (800), the first column lists the compared codes: RS; regenerators; exact regenerators; hierarchical; hitchhiker; and Babylon.

For all the codes, a file of the same size of M bytes coded using a (k=10, r=4) configuration was considered.

The evaluation was done for each code for one and the same storage efficiency (71%) column (804) on different parameters: the average bandwidth required for the repair (806), the disc inputs/outputs (808) and the tolerance to failures (810). The bandwidth and the disc inputs/outputs are expressed as a percentage relative to the size M of the original file. The percentages in the table are computed by taking account of the repair of the systematic nodes and of the parity nodes.

The results are as follows:

-   -   For an RS code, the repair of a single failure requires M bytes         to be read and transferred. This is equivalent to 100% of the         original size of the file.

For the same storage efficiency, the bandwidth required by the regenerator codes and the exact regenerator codes are computed. The bandwidth used for the repair of a failure is on average equal to 32.5%. However, the disc inputs/outputs are not optimized. To repair a single failure, the regenerator codes require all the data stored in the nodes to be read before they are transferred.

The hierarchical code optimizes both the bandwidth and the disc inputs/outputs. The quantity of the data read and transferred in the network is equal to 33.57%. However, the hierarchical code does not maintain the same tolerance to failures as the RS codes. In effect, it is not possible to tolerate any 4 failures by using the hierarchical code.

The hitchhiker code makes it possible to both optimize the bandwidth and the disc inputs/outputs while maintaining the same tolerance to failures as the RS code. In this case, the quantity of the data read and transferred is equal to 76.42%.

The Babylon code requires on average 43.57% of the size of the original file to repair a single failure in terms of bandwidth and of disc inputs/outputs. It also makes it possible to tolerate any 4 failures out of the 14 nodes.

The present invention can be implemented from hardware and/or software elements. It can be available as a computer program product on a computer-readable medium. The medium can be electronic, magnetic, optical or electro-magnetic. In one embodiment, the method is implemented by computer. A computer program product is described, said computer program comprising code instructions making it possible to perform one or more of the steps of the method, when said program is run on a computer. In one embodiment, the device for implementing the invention comprises a computer-readable storage medium (RAM, ROM, flash memory or another memory technology, for example disc medium or another non-transient computer-readable storage medium) coded with a computer program (that is to say several executable instructions) which, when it is run on one or more processors, performs the functions of the embodiments described previously. As an example of hardware architecture suitable for implementing the invention, a device can comprise a communication bus to which are linked a central processing unit or microprocessor (CPU, acronym for Central Processing Unit), which processor can be “multi-core” or “many-core”; a read-only memory (ROM) that can comprise the programs necessary for the implementation of the invention; a random access memory or cache memory (RAM) comprising registers suitable for saving variables and parameters created and modified during the execution of the abovementioned programs; and a communication or I/O (input/output) interface, suitable for transmitting and receiving data. In the case where the invention is installed on a reprogrammable computation machine (for example an FPGA circuit), the corresponding program (that is to say the sequence of instructions) can be stored in or on a removable storage medium (for example an SD card, or a mass storage such as a hard disc, e.g. an SSD) or non-removable, volatile or non-volatile storage medium, this storage medium being partially or totally readable by a computer or a processor. The computer-readable medium can be transportable or communicable or mobile or transmissible (i.e. by a 2G, 3G, 4G, Wifi, BLE, fibre optic or other telecommunication network). The reference to a computer program which, when it is run, performs any one of the functions described previously, is not limited to an application program running on a single host computer. On the contrary, the terms computer program and software are used here in a general sense to refer to any type of computer code (for example, application software, firmware, microcode, or any other form of computer instruction, such as web services or SOA or via programming interfaces API) which can be used to program one or more processors to implement aspects of the techniques described here. The computer means or resources can in particular be distributed (“cloud computing”), possibly with or according to pair-to-pair and/or virtualization technologies. The software code can be executed on any appropriate processor (for example, a microprocessor) or processor core or a set of processors, whether they are provided in a single computation device or distributed between several computation devices (possibly accessible in the environment of the device). Securing technologies (crypto-processors, authentication, encryption, chip card, etc) can be used. 

1. A method for coding (k, r) data comprising: dividing an initial datum a into k data blocks a_(i) of the same size; grouping the k data blocks into r−1 disconnected subsets S_(j) of data blocks; generating, for each subset S_(j), a linear function g_(i)(a) defined as a linear combination of the data blocks assigned to said subset S_(j); and generating r parity functions comprising a primary parity function f₀(a) as a linear combination of the k data blocks a_(i), and r−1 secondary parity functions, each secondary parity function f_(j)(a) being defined as the sum of the primary parity function f₀(a) and of a linear function g_(i)(a).
 2. The method according to claim 1, wherein the step of generating r parity functions includes generating r encoding vectors comprising: a primary encoding vector C₀ composed of k primary encoding coefficients c_(0,1) to c_(0,k); and r−1 secondary encoding vectors C_(j) (1<=j<=r−1), each secondary encoding vector being composed of k secondary encoding coefficients c_(j,1) to c_(j,k).
 3. The method according to claim 2, wherein each primary encoding coefficient c_(0,i) corresponds to a non-zero random number, and a secondary encoding coefficient c_(j,i) corresponds to a non-zero or zero random number depending on whether the data block a_(i) belongs to or does not belong to the subset S_(j).
 4. The method according to claim 3, wherein the step of generating a linear function g_(i)(a) includes generating a linear combination from the k data blocks (a₁, . . . , a_(k)) and from the k secondary encoding coefficients c_(j,1) to c_(j,k) of the secondary encoding vector C_(j).
 5. The method according to claim 1, wherein k=10 and r=4.
 6. The method for storing k+r data comprising at least one step including storing, in k systematic storage devices N_(i), i∈ {1, . . . , k}, k data blocks a_(i), and in storing, in r parity storage devices N_(i), i∈{k+1, . . . , k+r}, r parity functions, the r parity functions being obtained according to the data coding method of claim
 1. 7. A device for coding data comprising means suitable for implementing the steps of the data coding method according to claim
 1. 8. A method for reconstructing the content of a storage device out of a plurality of devices storing k data blocks a_(i) in k systematic storage devices N_(i), i∈{1, . . . , k}, and r parity functions f_(j)(a) in r parity storage devices N_(i), i∈{k+1, . . . , k+r}, said k data blocks and r parity functions being obtained according to the method of claim 1, the reconstruction method comprising: determining whether the storage device for which to reconstruct the content is a systematic or parity storage device; if said device is a systematic storage device N_(i), i∈{1, . . . , k}: determining to which subset of data S_(j) the data block of said device belongs; retrieving the primary and secondary parity functions stored in the parity storage devices k+1 and k+1+j; computing the linear function g_(j)(a) associated with the subset S_(j); retrieving the data blocks of the subset S_(j) except for the block a_(i); and reconstructing the content of said systematic storage device N_(i), i∈{1, . . . , k}; if said device is a parity storage device N_(i), i∈{k+1, . . . , k+r}, determining the index i identifying said storage device; if the index i is equal to k+1: retrieving the secondary parity function f₁(a) stored in the parity storage device of index i=k+2 and the data blocks a_(i) of the subset S₁; computing the linear function g₁(a) associated with the subset S₁ and decoding the primary parity function f₀(a); and reconstructing the content of said parity device of index i=k+1; if the index i is greater than k+1: retrieving the primary parity function f₀(a) stored in the parity storage device of index i=k+1 and the data blocks a_(i) of the subset S_(i−k+1); computing the linear function g_(i−k+1)(a) associated with the subset S_(i−k+)1 and decoding the secondary parity function f_(i−k+1)(a); and reconstructing the content of said parity device of index i>k+1.
 9. The method according to claim 8, wherein the steps of retrieving the parity functions includes transferring the data from the corresponding storage devices to a data collecting device.
 10. The method according to claim 8, wherein the step of computing the linear function g_(i)(a), associated with the subset S_(j) for the reconstruction of a systematic node i such that i∈{1, . . . , k}, includes subtracting the primary parity function stored in the storage device N_(k+1) from the secondary parity function stored in the storage device N_(k+1+j).
 11. The method according to claim 8, wherein the step of computing the linear function g₁(a), associated with the subset S₁ for the reconstruction of a parity node i such that i=k+1, includes generating a linear combination from the data blocks of the subset S₁ and from the encoding vector C₁, and the step of computing the linear function g_(j)(a), associated with the subset S_(j) for the reconstruction of a parity node i such that i∈{k+2, . . . , k+r}, includes generating a linear combination from the data blocks of the subset S_(j) and from the encoding vector C_(j).
 12. The method according to claim 8, wherein the step of reconstructing the content of said systematic storage device includes decoding the data block a_(i) from the linear function g_(j)(a).
 13. The method according to claim 8, wherein the step of reconstructing the content of said parity storage device i such that i=k+1 includes decoding the secondary parity function of said parity device of index i=k+1 from the computed linear function g₁(a), and the step of reconstructing the content of said parity storage device i such that i>k+1 includes decoding the secondary parity function of said parity device of index i>k+1 from the computed linear function g_(i−k+1)(a).
 14. A device for reconstructing the content of a storage device out of a plurality of devices storing k data blocks a_(i) in k systematic storage devices N_(i), i∈{1, . . . , k}, and r parity functions f_(j)(a) in r parity storage devices N_(i), i∈{k+1, . . . , k+r}, said k data blocks and r parity functions being obtained according to a method for coding (k, r) data comprising: dividing an initial datum a into k data blocks ai of the same size; grouping the k data blocks into r−1 disconnected subsets Sj of data blocks; generating, for each subset Sj, a linear function gj(a) defined as a linear combination of the data blocks assigned to said subset Sj; and generating r parity functions comprising a primary parity function f0(a) as a linear combination of the k data blocks ai, and r−1 secondary parity functions, each secondary parity function fj(a) being defined as the sum of the primary parity function f0(a) and of a linear function gj(a); the device further comprising means suitable for implementing the steps of the method for reconstructing the content of a storage device according to claim
 8. 15. A computer program product comprising a program, said program comprising code instructions making it possible to perform a method for coding (k, r) data comprising: dividing an initial datum a into k data blocks ai of the same size; grouping the k data blocks into r−1 disconnected subsets Sj of data blocks; generating, for each subset Sj, a linear function gj (a) defined as a linear combination of the data blocks assigned to said subset Sj; and generating r parity functions comprising a primary parity function f0(a) as a linear combination of the k data blocks ai, and r−1 secondary parity functions, each secondary parity function fj(a) being defined as the sum of the primary parity function f0(a) and of a linear function gj(a); and/or comprising code instructions making it possible to perform the steps of the method according to claim 8 when said program is run on a computer.
 16. An integrated circuit of ASIC or FPGA type comprising at least one device according to claim
 7. 17. A use of the content reconstruction method according to claim 8 in a distributed storage system, said distributed storage system comprising a plurality of data storage nodes capable of storing data coded according to a method for coding (k, r) data comprising: dividing an initial datum a into k data blocks ai of the same size; grouping the k data blocks into r−1 disconnected subsets Sj of data blocks; generating, for each subset Sj, a linear function gj(a) defined as a linear combination of the data blocks assigned to said subset Sj; and generating r parity functions comprising a primary parity function f0(a) as a linear combination of the k data blocks ai, and r−1 secondary parity functions, each secondary parity function fj(a) being defined as the sum of the primary parity function f0(a) and of a linear function gj(a).
 18. The use of the content reconstruction method according to claim 8 in a local storage system, said local storage system comprising data storage discs capable of storing data coded according to a method for coding (k, r) data comprising: dividing an initial datum a into k data blocks ai of the same size; grouping the k data blocks into r−1 disconnected subsets Sj of data blocks; generating, for each subset Sj, a linear function gj(a) defined as a linear combination of the data blocks assigned to said subset Sj; and generating r parity functions comprising a primary parity function f0(a) as a linear combination of the k data blocks ai, and r−1 secondary parity functions, each secondary parity function fj(a) being defined as the sum of the primary parity function f0(a) and of a linear function gj (a).
 19. An integrated circuit of ASIC or FPGA type comprising at least one device according to claim
 14. 20. An integrated circuit of ASIC or FPGA type comprising at least one device according to claim 14, and at least one device for coding data comprising means suitable for implementing the steps of a method for coding (k, r) data comprising: dividing an initial datum a into k data blocks ai of the same size; grouping the k data blocks into r−1 disconnected subsets Sj of data blocks; generating, for each subset Sj, a linear function gj(a) defined as a linear combination of the data blocks assigned to said subset Sj; and generating r parity functions comprising a primary parity function f0(a) as a linear combination of the k data blocks ai, and r−1 secondary parity functions, each secondary parity function fj(a) being defined as the sum of the primary parity function f0(a) and of a linear function gj(a). 